Abstract

In [A88], Abrahamson presented a solution to the
randomized consensus problem of Chor, Israeli
and Li [CIL87], without assuming the existence of
an atomic coin flip operation. This elegant algorithm
uses unbounded memory, and has expected
exponential running time. In [AH89], Aspnes and
Herlihy provide a breakthrough polynomial-time
algorithm. However, it too is based on the use
of unbounded memory. In this paper, we present
a solution to the randomized consensus problem
that is bounded in space and runs in polynomial
expected time.


1  Introduction 


The Consensus Problem in a shared memory environment
is that of providing an algorithm by
which n processes, running asynchronously and
communicating via shared memory, can agree on
a value. Loosely speaking, the algorithm should
have the following properties:

1. Consistency: No two processes decide on different
values;

2. Validity: If all processes have the same initial
value, then processes decide on that
value.

3. Wait-freeness: Each process is guaranteed
to decide after a finite number of steps, independently
of other processes.

In a shared memory in which only atomic read
and write operations are allowed there is no deterministic
solution to the problem. This result
was directly proved by [AG88, CIL87, LA87] and
can implicitly be deduced from [DDS87, FLP85].
Herlihy [H88] presents a comprehensive study of
the problem, and of its implications on the construction
of many synchronization primitives.

A randomized solution to the consensus problem
is one in which, rather than being guaranteed,
it is only expected that the number of steps until
a process decides is finite; that is, property (3)
above is replaced by:

3.  Finite  expected  waiting:  The  expected  num¬ 
ber  of  steps  until  a  process  decides  is  finite. 

Such an algorithm provides a basis for constructing
novel universal synchronization primitives,
such as the fetch and cons of [H88], or the sticky
bits of [P89].

Chor, Israeli, and Li [CIL87] were the first to
provide a time-efficient randomized solution to
the problem, using bounded size memory. Their
solution was based on the availability of a powerful
atomic coin flip operation. In [A88], Abrahamson
presented a first solution not assuming
the existence of such an operation. However,
this elegant algorithm uses unbounded memory,
and has exponential expected running time. The
question was thus raised:

Does there exist an algorithm that is
polynomial in running time and bounded
in memory size?

An exponential time algorithm can be derived
from that of [A88] (see [ADS89]) using a transformation
based on the concurrent time stamp system
techniques of [DS89]. Aspnes and Herlihy
(in [AH88]) provide a breakthrough algorithm
that runs in polynomial expected time. Unfortunately,
it is based on the use of unbounded size
memory in a “stronger” way than in [A88]. Since,
for reasons presented in the sequel, there seems to
be no transformation of [AH88] to a bounded protocol
using concurrent time stamping techniques,
the above question remained unanswered.

In this paper, we present a solution to the
randomized consensus problem that both runs
in polynomial expected time and is bounded in
memory size.

The main reason for the simplicity in providing
an exponential time randomized consensus algorithm
using bounded space is that all one need
provide are actually the properties of consistency
and non-triviality. The wait-freeness, i.e. exponential
expected waiting time, is (though hard
to analyze) just the result of the exponentially
small probability that processes flipping independent
coins will come up with the same value. To
provide the former two properties, one need only
create a locking mechanism that will provide exclusion
before allowing processes to decide on
a value. Such unbounded locking mechanisms
are based on time stamping concurrent lock setting
events, a process that has been shown to be
modularly replaceable using bounded concurrent
time stamp systems.

In order to obtain an algorithm that runs in
expected polynomial time, as [AH88], one must
limit the ability of the adversary to create non-decision
scenarios while processes try to lock for
values. A way of doing so is by basing a process’
decision to attempt to lock for a value on a
function of more than just one independent local
coin toss, preferably on many coin tosses by all
processes. This exact idea is abstracted into the
notion of creating a shared global coin [CMS85].
Since attempts to lock for a value based on the
shared coin could still fail (because, as shown in
[AH88], one cannot create a perfect coin), repeated
global coin tosses are needed. When implementing
multiple coin tosses, one must remember
that processes run at different paces, so
one should take care to a. prevent mixups between
locations in memory used for new and old
coins, and b. provide independence among shared
coin flips (this means preventing processes in old
coin toss phases from causing attempts of processes
in later coin tosses to fail). The algorithm
uses an unbounded strip of coins, where for each
toss a separate set of memory locations is allocated;
this allows one to distinguish between coin
tosses, and thus to meet the above requirements.

Summing the above, in achieving polynomial
expected time, unboundedness is used not to order
any two specific coin flipping events by the
relative times in which they occurred (a property
provided by concurrent time stamping), but
to tell by how many coin flipping events one process
is trailing behind the other.

In [AH88], in addition to the above use of unbounded
memory, the weak shared coin flip construction
requires that each coin location in the
unbounded strip be in itself unbounded. Finally,
the use of a random walk to create the shared
coin is based on a snapshot view of memory. The
implementation of this snapshot operation also
uses unbounded counters.

The main contribution of our paper is an implementation
that achieves the properties of the
coin strip using bounded memory. It is based on
a technique for maintaining a “shrunken” version
of the strip, effectively pulling together processes
that opened a gap between one another. In addition,
it is shown how to perform the random walk
using only bounded coin locations. Finally, our
algorithm is based on the availability of a memory
primitive on which a snapshot scan can be
performed. We show how to implement such a
primitive boundedly.

The rest of the paper is organized as follows.
In Section 2 a scannable memory primitive is defined
and constructed. In Section 3 a bounded
memory implementation of a weak shared coin
is presented. In Section 4 the implementation of
the coin strip is presented. We introduce a token
game capturing the properties of the strip.
A shrunken version of the game is shown to provide
the same properties, and is then translated
into a game on a weighted graph. Finally, a concurrent
implementation of the game on the graph
is presented. Section 5 shows how bounded size
strips of coins can be manipulated based on the
concurrent graph game. All the unbounded constructs
of the [AH88] type algorithm presented in
Section 5 are then replaced by the bounded ones,
providing the desired solution. In Section 6, an
outline of the correctness proof of the algorithm
is presented. Due to lack of space, some of the
proofs are omitted.

2  Snapshot  Scanning 

2.1  Definitions 

A  Scannable  Memory  V  is  an  abstract  data  type 
shared  among  n  concurrent  and  completely  asyn¬ 
chronous  processes.  There  are  two  operations 
that  any  process  can  execute  on  V,  a  write  oper¬ 
ation  and  a  scan  operation.  As  discussed  below, 
it  is  not  assumed  that  these  operations  are  nec¬ 
essarily  waitfree  [H88,  AG88]. 

Assume that each process’ program consists,
among others, of the above two operations, whose
execution generates a sequence of elementary operation
executions, totally ordered by the precedes
relation (of [L86a, L86c], denoted “$\to$”).
The following

$W_i^{[1]} \to S_i^{[1]} \to W_i^{[2]} \to W_i^{[3]} \to S_i^{[2]} \to S_i^{[3]} \to S_i^{[4]} \to \cdots$

is an example of such a sequence by process i,
where $W_i^{[k]}$ denotes process i’s kth execution of a
write operation, and $S_i^{[k]}$ the kth execution of a
scan operation (the superscript [k] is used for notation,
and is not visible to the processes). One
should bear in mind that the asynchronous nature
of the operations allows situations where
a scan overlaps many consecutive write operations
of other processes. Also, several consecutive
scans could possibly be overlapped by a single
write operation.


Let $\dashrightarrow$ be the can affect relation of [L86a,
L86c]. A global time model¹ of operation executions
is assumed (see [L86a, B88]). The following
definition attempts to capture the notion that
a possible effect of one operation on the shared
memory (such as the writing of a value) existed
at a point in global time where the other was
being executed.

Definition 2.1. A write operation execution $W_i^{[a]}$
potentially coexists with another operation
execution $O_j^{[b]}$ (O stands for either a scan or
write) if $W_i^{[a]} \dashrightarrow O_j^{[b]}$ and there does not exist
a $W_i^{[a']}$ such that $W_i^{[a]} \to W_i^{[a']} \to O_j^{[b]}$.
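Definition 2.1 can be made concrete with a small sketch. It assumes one particular reading of the global time model: each operation execution occupies a time interval, "precedes" means the first interval ends before the second begins, and "can affect" means the first begins before the second ends. The names `Op`, `precedes`, `can_affect` and `potentially_coexists` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """An operation execution occupying the global-time interval [start, end]."""
    start: float
    end: float


def precedes(a: Op, b: Op) -> bool:
    """a -> b: a finishes before b starts (assumed interval reading)."""
    return a.end < b.start


def can_affect(a: Op, b: Op) -> bool:
    """a --> b: a starts before b finishes (assumed interval reading)."""
    return a.start < b.end


def potentially_coexists(w: Op, o: Op, writes_of_writer: list) -> bool:
    """Definition 2.1: w can affect o, and no later write by the same
    process lies completely between w and o."""
    if not can_affect(w, o):
        return False
    return not any(precedes(w, w2) and precedes(w2, o) for w2 in writes_of_writer)


# Two writes by one process, and two scans by another.
w1, w2 = Op(0, 1), Op(2, 3)
early_scan = Op(2.5, 5)   # overlaps w2
late_scan = Op(4, 6)      # starts after w2 has finished
```

Under these assumptions both writes potentially coexist with `early_scan` (w2 has not finished when the scan starts), while only w2 potentially coexists with `late_scan`, since w2 then lies entirely between w1 and the scan.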

With each write operation execution $W_i^{[a]}$ a
value $v_i^{[a]}$ written into V is associated. A scan
operation $S_j^{[b]}$ returns a view, a set of values
$\bar{v} = \{v_1^{[a_1]}, \ldots, v_n^{[a_n]}\}$.

The following requirement is made to assure that
the snapshot view $\bar{v}$ returned by $S_j^{[b]}$ is a meaningful
one, namely, returning the values of write
events immediately before or concurrent with the
scan, and not just any possible set of values.

P1 regularity: For any value $v_i^{[a]}$ in $\bar{v}$ of $S_j^{[b]}$,
$W_i^{[a]}$ potentially coexisted with $S_j^{[b]}$.

The  above  eliminates  uninteresting  trivial  so¬ 
lutions  and  introduces  a  measure  of  liveness  into 
the  system.  More  importantly,  it  implies  that 
the  behavior  of  the  scannable  memory  is  as  if 
it  consists  of  disjoint  registers,  one  per  process, 
which  the  designated  process  can  write,  and  all 
can  read.  This  is  very  different  from  the  behav¬ 
ior  of  multi  reader  multi  writer  atomic  registers, 
where  the  latest  write  of  any  process  erases  the 
values  written  by  others. 

Though a scan as above is sufficient for many
applications, one is interested in a scan that returns
an “instantaneous” view of memory, that
is, having the following stronger property.

1 Implying that for any two operation executions,
$a \to b$ or $b \dashrightarrow a$.

2 Initialization and safety are similar to Axioms B0-3
for single-writer atomic registers [L86b].


P2 snapshot: For any two values $v_i^{[a]}$ and $v_j^{[b]}$
in $\bar{v}$ of $S_k^{[c]}$, $W_i^{[a]}$ potentially coexisted with
$W_j^{[b]}$, or $W_j^{[b]}$ potentially coexisted with
$W_i^{[a]}$, or both.

Though P1-2 return values that could have been
returned by an instantaneous scan, they do not
imply that scan operations of all processes are
serializable. Moreover, they do not imply that
later scans will obtain later snapshot views. The
following property is therefore added to formalize,
together with P1-2, the idea that all scans
are serializable.

P3 scan serializability: Let $S_x^{[c]}$ and $S_y^{[c']}$ be any
pair of scans. Let $v_i^{[a_i]}$ and $v_i^{[a'_i]}$, $i \in \{1..n\}$,
denote the corresponding values returned by
the two scans. Then either for every $i \in \{1..n\}$,
$a_i \le a'_i$, or for every $i \in \{1..n\}$, $a'_i \le a_i$.
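As a small illustration of P3, the check below compares two views by the indices of the writes whose values they contain. It is only a sketch: representing a view as a list of write indices $a_1, \ldots, a_n$, and the name `serializable_pair`, are assumptions made here for illustration.

```python
def serializable_pair(view_a, view_b):
    """Return True if one view is componentwise no later than the other.

    Each view is a list of write indices, one per process, i.e. the
    indices a_i of the writes whose values a scan returned.
    """
    if len(view_a) != len(view_b):
        raise ValueError("views must cover the same set of processes")
    a_not_later = all(x <= y for x, y in zip(view_a, view_b))
    b_not_later = all(y <= x for x, y in zip(view_a, view_b))
    return a_not_later or b_not_later
```

Two views such as `[1, 2, 3]` and `[1, 3, 3]` can be ordered, whereas `[2, 1]` and `[1, 2]` cannot: each scan would have seen a later write than the other, which is exactly what P3 rules out.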

For the purposes of the applications in this paper,
it is not required that both scan and write
operations be waitfree [H88, AG88]. Since every
process’ execution sequence will be an alternating
sequence of scan followed by write, it will actually
suffice that in any infinite system execution,
there exists a new write operation infinitely often.
In the full paper, a formal treatment of this
property is provided.

2.2  Bounded  Implementation  of 
Scannable  Memory 

The implementation is based on the use of
single-writer-multi-reader and two-writer-two-reader
atomic registers. The scannable memory
V will consist of n single-writer-multi-reader
atomic registers $V_i$, $i \in \{1..n\}$, each $V_i$ written
by process i and read by all. In addition, for every
pair of processes i and j, a pair of two-writer-two-reader
atomic registers $A_{ij}$ and $A_{ji}$ are maintained³.
Bounded constructions of such registers
from weaker primitives are shown in [Bl87, L86b,
IL88, BP87, N87, SAG87, LV88, DS89]. Register
$A_{ij}$ is used by i to inform j that it has updated
$V_i$, and by j to mark that it has read $V_i$. To
simplify the proofs (and only for this purpose),
an alternating bit field is assumed to be added
to each register $V_i$, such that two values written
in consecutive writes by the same process always
differ.

3 To save in the complexity of constructing multi-writer
registers, the arrows technique of [DGS88] can be used.

The main idea behind the implementation of
the scan and write operations is as follows. A
value of 1 in register $A_{ji}$ denotes an “arrow”
pointing from j to i, a value of 0 denotes an arrow
from i to j. To scan the memory, a process i will
direct all arrows $A_{ji}$ towards other processes, perform
a collecting of values followed by a collecting
of arrows, and repeat these two collections again.
If the values have not changed and no arrow has
been redirected towards it, process i has collected
a snapshot in its second read of every register.⁴
To write a value, a process j directs the arrows
$A_{ji}$ towards any possibly-scanning process, notifying
that it has started a write, then writes the
value. The following are the write and scan procedures
of a process i, where we use the notation
$j \in \{1..n\} - \{i\}$ to denote that indexing is performed
in some arbitrary order.

procedure write(value);
begin
    for j ∈ {1..n} − {i} do A_ij := 1 od;
    V_i := value;
end write;


Assume that a process, during the execution of
the scan operation, has seen no arrows redirected,
and both values being the same. It can thus deduce
that no process whose corresponding value it
returns could have performed its following write
completely before any of the other writes whose
values it returns. The reason is that if that were
the case, the writing process would have turned
the arrow and the scan would have gone through
another round.

function scan;
begin
L:  for j ∈ {1..n} − {i} do A_ji := 0 od;
    for j ∈ {1..n} − {i} do V1[j] := V_j od;
    for j ∈ {1..n} − {i} do V2[j] := V_j od;
    for j ∈ {1..n} − {i} do A[j] := A_ji od;
    if (∃j)(A[j] = 1 ∨ V1[j] ≠ V2[j])
        then goto L fi;
    return V2;
end scan;

4 The two phases of value-collecting are also used to
simplify the proofs.
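The write and scan procedures above can be exercised with a sequential simulation. This is a minimal sketch under stated assumptions, not a concurrent implementation: registers are plain Python lists, `A[w][s]` stands for the arrow register shared by writer w and scanner s, and the `between_collects` hook is a hypothetical device for injecting a write by another process between the two value collections.

```python
class ScannableMemory:
    """Sequential sketch of the arrow-based scannable memory."""

    def __init__(self, n, initial=None):
        self.n = n
        # V[i] holds (value, alternating bit) so consecutive writes always differ.
        self.V = [(initial, 0)] * n
        # A[w][s] == 1 means writer w has pointed its arrow at scanner s.
        self.A = [[0] * n for _ in range(n)]

    def write(self, i, value):
        for j in range(self.n):
            if j != i:
                self.A[i][j] = 1          # announce the write to every possible scanner
        self.V[i] = (value, 1 - self.V[i][1])

    def scan(self, i, between_collects=None):
        attempts = 0
        while True:                        # corresponds to "goto L"
            attempts += 1
            for j in range(self.n):
                if j != i:
                    self.A[j][i] = 0      # direct all arrows away from the scanner
            first = list(self.V)           # first collection of values
            if between_collects is not None:
                between_collects(attempts)
            second = list(self.V)          # second collection of values
            arrows = [self.A[j][i] for j in range(self.n) if j != i]
            if any(a == 1 for a in arrows) or first != second:
                continue                   # a write interfered: retry
            return [value for value, _ in second], attempts
```

A usage sketch: with three processes, after processes 0 and 1 write, a scan by process 2 succeeds on its first attempt. If a write by process 0 is injected between the two collections of the first attempt, the scan retries once and then returns the newer value.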


Though the write operation is waitfree, the scan
operation is of course not, because scans may repeatedly
be forced to return to line L. However,
scans do not wait for other scans, and the above
can only happen on account of repeated execution
of new write operations by some process. Thus, it
can be proven that the implementation provides
the type of progress described in the previous section.

The following is the main core of the proofs of
properties P1-3. The notation $r1_j^{[b]}(V_i)$, for example,
will denote the first read in scan operation
execution $S_j^{[b]}$ of register $V_i$.

Lemma 2.1. For any value $v_i^{[a]}$ in $\bar{v}$ of $S_j^{[b]}$,
$W_i^{[a]}$ potentially coexisted with $S_j^{[b]}$.

Proof Assume by way of contradiction that the
claim does not hold. There must thus exist some
value $v_i^{[a]}$ in $\bar{v}$ of $S_j^{[b]}$ such that $\neg(W_i^{[a]} \dashrightarrow S_j^{[b]})$
or $(\exists W_i^{[a']})(W_i^{[a]} \to W_i^{[a']} \to S_j^{[b]})$. By the
assumption of global time, $\neg(W_i^{[a]} \dashrightarrow S_j^{[b]})$ implies
$S_j^{[b]} \to W_i^{[a]}$, in which case, by atomic register
axiom Bf of [L86c], it cannot be that $v_i^{[a]}$ was returned.
Thus, the second condition must hold,
which by the scan algorithm implies

$w_i^{[a]}(V_i) \to w_i^{[a']}(V_i) \to r2_j^{[b]}(V_i)$

where $v_i^{[a]}$ was returned in $r2_j^{[b]}(V_i)$, a contradiction
to atomic register axiom Bf of [L86c]. ■

This implies P1; the following proves that P2 is met.

Lemma 2.2. For any two values $v_i^{[a]}$ and $v_j^{[b]}$ in
$\bar{v}$ of $S_k^{[c]}$, $W_i^{[a]}$ potentially coexisted with $W_j^{[b]}$, or
$W_j^{[b]}$ potentially coexisted with $W_i^{[a]}$, or both.

Proof Assume by way of contradiction that the
claim does not hold. There must thus exist two
values $v_i^{[a]}$ and $v_j^{[b]}$ in $\bar{v}$ of $S_k^{[c]}$ such that neither
$W_j^{[b]}$ nor $W_i^{[a]}$ potentially coexisted with the
other. W.l.o.g., it must be that

$(\exists W_i^{[a']})(W_i^{[a]} \to W_i^{[a']} \to W_j^{[b]})$.

By the scan algorithm, $w_k^{[c]}(A_{jk}) \to r2_k^{[c]}(V_i)$.
Since $v_i^{[a]}$ and not $v_i^{[a']}$ was returned in $r2_k^{[c]}(V_i)$,
$r2_k^{[c]}(V_i) \to w_i^{[a']}(V_i)$. Because $W_i^{[a']} \to W_j^{[b]}$,
it must be that
$w_i^{[a']}(V_i) \to w_j^{[b]}(A_{jk}) \to w_j^{[b]}(V_j)$. Also, because
$v_j^{[b]}$ was returned in $r2_k^{[c]}(V_j)$, it must be
the case that $w_j^{[b]}(V_j) \to r2_k^{[c]}(V_j)$. Again by the
scan algorithm, $r2_k^{[c]}(V_j) \to r_k^{[c]}(A_{jk})$. From the
above, by the transitivity of $\to$, it follows that

$w_k^{[c]}(A_{jk}) \to w_j^{[b]}(A_{jk}) \to r_k^{[c]}(A_{jk})$.

Since in $w_j^{[b]}(A_{jk})$ a value of 1 was written (by the
write procedure), this value must have been read in
$r_k^{[c]}(A_{jk})$, a contradiction to the termination condition
of the scan algorithm. ■

Using similar arguments the next two lemmas
prove P3. The following lemma establishes that
in the two reads of any scan operation execution,
the value written in the exact same write is returned.

Lemma 2.3. In any scan operation execution
$S_k^{[c]}$, for any value $v_i^{[a]}$ in $\bar{v}$, $v_i^{[a]}$ was read in
both $r1_k^{[c]}(V_i)$ and $r2_k^{[c]}(V_i)$.

Proof Assume by way of contradiction that the
above does not hold. Since the values read in
$r1_k^{[c]}$ and $r2_k^{[c]}$ must be the same, and two consecutive
writes have different toggle bit values, it
must be that for $v_i^{[a]}$ and $v_i^{[a']}$ returned in $r1_k^{[c]}$
and $r2_k^{[c]}$ respectively, there must exist a write
operation execution $W_i^{[a'']}$ such that

$W_i^{[a]} \to W_i^{[a'']} \to W_i^{[a']}$.

In a manner similar to that of the former proof,
by the ordering of reads of $A_{ik}$ and $V_i$, it must be
that

$w_k^{[c]}(A_{ik}) \to r1_k^{[c]}(V_i) \to w_i^{[a'']}(V_i) \to w_i^{[a']}(A_{ik}) \to w_i^{[a']}(V_i) \to r2_k^{[c]}(V_i) \to r_k^{[c]}(A_{ik})$.

This implies that the value of 1 written in
$w_i^{[a']}(A_{ik})$ must have been read in $r_k^{[c]}(A_{ik})$, a contradiction
to the scan's termination condition. ■

Lemma 2.4. Let $S_x^{[c]}$ and $S_y^{[c']}$ be any pair of
scans. Let $v_i^{[a_i]}$ and $v_i^{[a'_i]}$, $i \in \{1..n\}$, denote the
corresponding values returned by the two scans.
Then either for every $i \in \{1..n\}$, $a_i \le a'_i$, or for
every $i \in \{1..n\}$, $a'_i \le a_i$.

Proof Assume by way of contradiction that the
claim does not hold. There must thus exist values
$v_i^{[a]}$ and $v_j^{[b]}$ in $\bar{v}$, and $v_i^{[a']}$ and $v_j^{[b']}$ in $\bar{v}'$,
such that $a < a'$ and $b > b'$.

Lemma 2.3 implies that the value returned in
both reads of a scan operation execution is of the
same write operation. In the scan operation execution
of y, since in $r1_y^{[c']}(V_i)$, $v_i^{[a']}$ was returned,
$w_i^{[a']}(V_i) \to r1_y^{[c']}(V_i)$. Since in $r2_y^{[c']}(V_j)$, $v_j^{[b]}$
was not returned, $r2_y^{[c']}(V_j) \to w_j^{[b]}(V_j)$. By the
order of reads in a scan it thus follows that

$w_i^{[a']}(V_i) \to r1_y^{[c']}(V_i) \to r2_y^{[c']}(V_j) \to w_j^{[b]}(V_j)$.

By similar arguments, regarding the scan operation
execution of x,

$w_j^{[b]}(V_j) \to r1_x^{[c]}(V_j) \to r2_x^{[c]}(V_i) \to w_i^{[a']}(V_i)$.

By transitivity, the combination of these two sequences
of operation executions contradicts the
antisymmetry property of the partial order $\to$. ■


3  A  Bounded  Implementation  of  a 
Shared  Coin 

The  implementation  of  the  weak  shared  coin  is 
based  on  the  random  walk  technique  of  [AH88]. 
For  lack  of  space  we  explain  only  the  modification 
allowing  to  bound  the  size  of  the  counters  used  to 
implement  the  coin.  The  main  idea  of  the  modi¬ 
fication  used  is  rather  straightforward.  The  coin 
implemented  by  the  random  walk  is  weak,  that 


is,  involves  a  small  probability  that  processes  will 
disagree  on  the  coin’s  outcome.  Thus,  one  can  al¬ 
low  a  process  to  always  decide  heads  in  case  its 
counter  overflows,  as  long  as  the  probability  of 
this  event  can  be  absorbed  into  the  probability 
of  processes  disagreeing  on  the  outcome. 

Let $c = \langle c_1, \ldots, c_n \rangle$ be an array of counters
implementing a shared coin. Each counter
$c_i$ has values in the range $\{-(m+1)..(m+1)\}$,
written by its corresponding process i. Let
$walk\_value(c) = \sum_{i=1}^{n} c_i$. The following are thus
the functions of process i, for determining if the
random walk has led to a coin value, and for performing
a step in the random walk by process i.

function coin_value(c);
begin
    if c_i ∉ {−m..m} then
        return heads fi;
    if walk_value(c) > δ·n then
        return heads
    elseif walk_value(c) < −δ·n then
        return tails
    else return undecided fi fi;
end coin_value;

procedure walk_step;
begin
    if flip = heads then c_i := c_i + 1
    else c_i := c_i − 1 fi;
end walk_step;
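The two procedures translate almost directly into Python. The sketch below is illustrative only: it runs the walk under a simple round-robin schedule in a single thread (an assumption, since the paper considers an adversarial asynchronous scheduler), and the driver `toss` and its parameters are hypothetical names introduced here.

```python
import random

HEADS, TAILS, UNDECIDED = "heads", "tails", "undecided"


def coin_value(c, i, m, delta):
    """Outcome seen by process i for counter array c (bounded-counter rule)."""
    n = len(c)
    if not -m <= c[i] <= m:
        return HEADS                 # own counter overflowed: decide heads
    total = sum(c)                   # walk_value(c)
    if total > delta * n:
        return HEADS
    if total < -delta * n:
        return TAILS
    return UNDECIDED


def walk_step(c, i, rng):
    """Process i flips a local fair coin and moves its counter by one."""
    c[i] += 1 if rng.random() < 0.5 else -1


def toss(n, m, delta, seed=0):
    """Round-robin driver (assumed schedule): stop when some process decides."""
    rng = random.Random(seed)
    c = [0] * n
    i = 0
    while True:
        outcome = coin_value(c, i, m, delta)
        if outcome != UNDECIDED:
            return outcome, c
        walk_step(c, i, rng)
        i = (i + 1) % n
```

Because a process steps only while its own counter is within $\{-m..m\}$, every counter stays in the range $\{-(m+1)..(m+1)\}$ stated in the text.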

Lemma 3.1 (Aspnes and Herlihy). The
probability that two processes will disagree on the
coin's outcome is $(\delta - 1)/(2\delta)$.

Lemma 3.2 (Aspnes and Herlihy). The
expected number of steps until the coin is decided
is $(\delta + 1)^2 n^2$.

Look at a random walk starting from 0 with
barriers at b and −b, consisting of the steps:

$\delta_1, \delta_2, \ldots$, where $\delta_i \in \{-1, +1\}$ for all i.

The following is a bound on the probability that
after m steps, none of the barriers was crossed.
Define

$S_m = \mathrm{Prob}\left[\,\left|\sum_{i=1}^{m} \delta_i\right| < b\,\right]$.

Clearly, the desired probability is bounded from
above by $S_m$.

Lemma 3.3. Let $m = (f(b)\,b)^2$ for some function
f; then there exists a constant C such that
$S_m \le C / f(b)$ (proof omitted).
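The quantity $S_m$ can be computed exactly for small parameters, which gives a feel for the bound. The sketch below assumes fair, independent ±1 steps and reads $S_m$ as the probability that the sum of the first m steps has absolute value below b; the function name `stay_probability` is introduced here for illustration. It uses the fact that a walk with k up-steps out of m ends at position 2k − m.

```python
from fractions import Fraction
from math import comb


def stay_probability(m, b):
    """Exact S_m = Prob[|delta_1 + ... + delta_m| < b] for fair +/-1 steps."""
    favourable = sum(comb(m, k) for k in range(m + 1) if abs(2 * k - m) < b)
    return Fraction(favourable, 2 ** m)


if __name__ == "__main__":
    b = 4
    for f in (1, 2, 4):
        m = (f * b) ** 2              # assumed reading of m = (f(b) b)^2
        s = stay_probability(m, b)
        print(f"f={f}  m={m}  S_m={float(s):.4f}  f*S_m={float(f * s):.4f}")
```

Running the loop shows $S_m$ shrinking as f grows while the product $f \cdot S_m$ stays bounded, which is the behaviour a bound of the form $C/f(b)$ describes. This is a numerical illustration, not a proof.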

Based  on  the  above,  one  can  prove  that  by 
choosing  m  to  be  large  enough,  the  probability 
that  the  adversary  can  force  processes  to  disagree 
because  of  the  deterministic  choice  of  heads  in 
case  of  counter  overflow,  is  negligible,  as  formal¬ 
ized  by  the  following  lemma: 

Lemma 3.4. There exists a constant C such that,
in the random walk generated by a sequence of
executions of the algorithm on a given coin c,

$\mathrm{Prob}\left[\,|c_i| > m\,\right] \le \frac{C\,\delta\,n}{\sqrt{m}}$.

4  The Rounds Strip

In this section a method is shown for replacing
the unbounded strip of round locations required
by the algorithm of [AH88] by a bounded construct.
The important observation is that this
algorithm utilizes the rounds strip in a very restricted
way. Informally:

Observation 1. There exists a constant K such
that at any point in the computation:

1. The actions performed by any process are
not affected by values of processes that are
strictly more than K rounds behind it.

2. If a process performs round r, and cannot
decide, then there is a disagreement about
the value of the shared coin of round r − K.
This implies that when this process proceeds
to round r + 1, it can withdraw its contribution
to the coin of round r − K, without
affecting the performance of the algorithm.

Thus, a complete picture of the rounds in which
processors are located is not necessary; rather, it
suffices to maintain a “compressed” description of
the distances between these round numbers, and
to save processes' contributions to the K latest
coins that were flipped. The following subsections
present the data structure used to maintain
these distances concurrently.


In the next subsection, a simple game is presented
in order to make precise the notion of
“compression” mentioned above. Then, in Section
4.2, we show how to store and play this game
using a directed weighted graph. In order to simplify
the presentation this game is sequential. In
Section 4.3, a data structure that implements the
game on the graph is defined, as well as the procedures
for playing the game on this graph concurrently.

The main problem is how to maintain the relevant
values using bounded space, given that processes
are asynchronous. For example, it could
be that a process will start flipping a coin in a
round r when round r is maximal, and during its
coin flipping other processes will move to higher
rounds, that are an unbounded number of coin
flips ahead.

4.1  The  Game 

Imagine  the  changes  to  the  processes'  round  num¬ 
bers  as  a  game  played  on  the  natural  numbers 
(viewed  as  an  infinite  ordered  set  of  points): 

Each processor controls a token, placed at a
specific point, initially 0. Denote by $r_i$ the location
of i’s token. Each processor can perform the
step $move\_token_i$ that places its token at place
$r_i + 1$. The game is a (possibly infinite) sequence
of the form $move\_token_{i_1}, move\_token_{i_2}, \ldots$

At any stage of the game, the collection of
tokens’ positions forms a multi-set of integers,
$S = \{r_1, \ldots, r_n\}$. Let $\pi$ be the ordering permutation
of S, i.e., $S = \{r_{\pi(1)} \le r_{\pi(2)} \le \cdots \le r_{\pi(n)}\}$.

Let K be some fixed constant. We now introduce
two transformations that, when applied to
the set S, produce a “compressed” representation
of it, without losing important information.

Shrinking. One is interested in the exact distance
between two tokens if and only if the distance
between them is less than K. The goal
of the first transformation is to “shrink” gaps of
length strictly larger than K, to be of size K.
Informally, $shrink_K(S)$ is a new set S', in which
$r_{\pi(1)}$ remains in its current position, while any
two consecutive tokens ($r_{\pi(i)}$ and $r_{\pi(i+1)}$) that
are more than K apart become K apart, while
the distances between tokens that are less than K
apart remain unchanged.

Formally, let $S = \{r_{\pi(1)} \le \ldots \le r_{\pi(n)}\}$. Let
$gap_i = r_{\pi(i+1)} - r_{\pi(i)}$, for $1 \le i < n$, and define
$shrink_K(S) = \{r'_{\pi(1)} \le \ldots \le r'_{\pi(n)}\}$ (for some
parameter K) inductively as follows:

(1) $r'_{\pi(1)} = r_{\pi(1)}$.

(2) Assume we have defined $r'_{\pi(i)}$; then

$r'_{\pi(i+1)} = r'_{\pi(i)} + K$ if $gap_i > K$, and
$r'_{\pi(i+1)} = r'_{\pi(i)} + gap_i$ otherwise.

Intuitively, any “gap” in the sequence whose
length is strictly larger than K is “shrunk” to be
of length exactly K.

The shrunken token game is conducted by executing
a $shrink_K$ on the set of token places
after each $move\_token_{i_j}$ step, before the next
$move\_token_{i_{j+1}}$ step.

Normalizing. It is easy to see that after applying
$shrink_K$ to any set S, the distance between
the maximal element and the minimal element is
at most $K \cdot n$. To compress the values even further
they are normalized, so that all values remain in
a bounded range.

The ordering permutation of $S' = shrink_K(S)$
is still $\pi$. The transformation $normalize_K(S')$
maps each element $r'_i \in S'$ to $(r'_i - r'_{\pi(n)}) + Kn$.
That is, the maximal token(s) is positioned at
$K \cdot n$, and the rest of the tokens are moved behind
it while maintaining the distances between
tokens. Notice that for any set S, all the values
in $normalize_K(shrink_K(S))$ are in the range
$[0..Kn]$.

The normalized shrunken game is conducted
by applying $shrink_K$ and then $normalize_K$ to
the set of token places after each $move\_token_{i_j}$
step, before the next $move\_token_{i_{j+1}}$ step.

An important property preserved by the normalized
shrunken game is:

Non-Passive Shrinking. For any two token
positions $r_i$ and $r_j$ in a state of the game,
s.t. $0 < r_i - r_j < K$, if for later token positions
$r'_i$ and $r'_j$ we have $r'_i - r'_j = (r_i - r_j) - 1$,
then there is a $move\_token_j$ between the two
states.

4.2  Representation as a Finite Graph

Given a state S of the above game, we define
its distance graph G(S), as follows: G is a directed
weighted graph with nodes $V = \{1..n\}$,
corresponding to tokens, one per process, edges
$E = \{(i,j) \mid r_j \le r_i\}$ indicating relative order of
token locations, and weights w(i,j), defined for
any $(i,j) \in E$ as $w(i,j) = \min(r_i - r_j, K)$.

The following properties of the distance graph G
are implied by the definition of the normalized
shrunken token game:

1. For any i and j in V, at least one of (i,j) or
(j,i) is in E; both edges are in E if and only
if the weight of both is 0.

2. There is no positive cycle, that is, a cycle
including an edge (i,j) with w(i,j) > 0.

3. Let P(i,j) be the set of all directed simple
paths from i to j. For every path $\varphi \in P(i,j)$,
let $W(\varphi) = \sum_{(u,v) \in \varphi} w(u,v)$. It follows from
the above properties that $0 \le W(\varphi) \le K \cdot n$.

4. For any two directed paths $\varphi_1$ and $\varphi_2 \in P(i,j)$,
either $W(\varphi_1) = W(\varphi_2)$ or there exists
an edge (u,v) on one of the two paths
such that w(u,v) = K.

5. For any i and j such that $P(i,j) \ne \emptyset$, define

$dist(i,j) = \max_{\varphi \in P(i,j)} W(\varphi)$,

and define max_paths(i,j) to be

$\{\varphi \in P(i,j) \mid W(\varphi) = dist(i,j)\}$.

Then $W(\varphi) = r_i - r_j$ for every $\varphi \in$
max_paths(i,j).

Let inc(i,G) be defined as the following transformation of graph G for a given i:

    for all j ≠ i in V do
        if (j,i) ∈ G and
           (∃k)((j,i) ∈ max_paths(k,i)) then
            w(j,i) := w(j,i) − 1 fi;
        if (i,j) ∈ G and
           0 ≤ w(i,j) < K then
            w(i,j) := w(i,j) + 1 fi;
        if w(j,i) < 0 then
            E := E − {(j,i)} ∪ {(i,j)};
            w(i,j) := −w(j,i) fi;
    od;

Claim 4.1. For a state S' reached from state S by a move_token of i in the token game, G(S') = inc(i, G(S)).

4.3  Implementation  of  the  Graph 

Property (1) of the distance graph implies that the weights of all (undirected) edges suffice to induce the directed graph structure. The weights are maintained in a collection of arrays e_i[1..n] of edge counters, one per each (undirected) edge (e_i[i] is not used). Each pair e_i[j] and e_j[i] of counters, in the range {0..3K−1}, represents two pointers (of i and j, respectively) to a cycle of size 3K. By incrementing the counter, a process moves its pointer in a clockwise direction (all arithmetic in this subsection is modulo 3K).

Assume e_i[j] − e_j[i] < e_j[i] − e_i[j]; then the edge is (i,j), and w(i,j) = e_i[j] − e_j[i], and vice versa. Thus, given two edge counters e_i[j] and e_j[i], the existence of a given directed edge is determined by the rule

    (i,j) ∈ G if (e_i[j] − e_j[i]) ≤ (e_j[i] − e_i[j])

and the weight w(i,j) of the edge (i,j) is (e_i[j] − e_j[i]). Note that if e_i[j] = e_j[i], then we have both edges, (i,j) and (j,i), with both weights equal to 0. To keep the weight w(i,j) in the range {0..K}, a process i does not increment e_i[j] unless it is the trailing pointer, or it leads by less than K.
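The decoding rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `edge_from_counters` and `may_increment` are hypothetical and do not appear in the paper, and the second function encodes only the necessary condition quoted in the last sentence, not the full increment rule of the procedure that follows.

```python
def edge_from_counters(e_ij, e_ji, K):
    """Decode the directed edge(s) between i and j from their cyclic counters.

    e_ij is process i's counter for the pair, e_ji is process j's counter.
    Returns a dict with key "ij" and/or "ji" mapped to the edge weight.
    All differences are taken modulo 3K, as in the text.
    """
    M = 3 * K
    lead_i = (e_ij - e_ji) % M  # clockwise distance from j's pointer to i's
    lead_j = (e_ji - e_ij) % M  # clockwise distance from i's pointer to j's
    edges = {}
    if lead_i <= lead_j:
        edges["ij"] = lead_i
    if lead_j <= lead_i:
        edges["ji"] = lead_j
    return edges


def may_increment(e_ij, e_ji, K):
    """Necessary condition for i to increment e_ij (hypothetical helper):
    i is the trailing (or tied) pointer, or it leads by less than K."""
    edges = edge_from_counters(e_ij, e_ji, K)
    if "ji" in edges:
        return True
    return edges["ij"] < K


if __name__ == "__main__":
    K = 2  # cycle of size 3K = 6
    print(edge_from_counters(1, 5, K))  # i's pointer wrapped around the cycle
    print(edge_from_counters(3, 3, K))  # tie: both edges with weight 0
    print(may_increment(1, 5, K), may_increment(5, 1, K))
```

The cycle has size 3K rather than 2K+1 or similar, so that a lead of at most K is always strictly shorter than the complementary arc (at least 2K), which keeps the comparison unambiguous even after wrap-around.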

Let make_graph be the procedure that, given the collection of all edge counters, creates a graph representation, as described above. The following procedure is thus the (possibly concurrent) implementation of one increment move on the graph G.

function inc_graph(e_1[1..n]..e_n[1..n]);
begin
    G := make_graph(e_1[1..n]..e_n[1..n]);
    for j := 1 to n skip i do
        if ((j,i) ∈ G and
            (∃k)((j,i) ∈ max_paths(k,i))) or
           ((i,j) ∈ G and w(i,j) < K) then
            e_i[j] := e_i[j] + 1 mod 3K
        fi;
    od;
end;
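As a rough, sequential illustration of the procedure above, the following Python sketch simulates one increment move on a counter matrix. Several things here are assumptions rather than part of the paper: processes are numbered from 0, the counters are a plain list of lists instead of scannable shared memory, the max_paths test is implemented with a max-plus Floyd-Warshall pass (which relies on the graph having no positive cycle), and `counters_from_positions` is a hypothetical helper that builds counters from token positions by capping each lead at K.

```python
NEG = float("-inf")


def make_graph(e, K):
    """Return {(i, j): w} for every directed edge implied by the counters."""
    n, M = len(e), 3 * K
    w = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                lead = (e[i][j] - e[j][i]) % M
                back = (e[j][i] - e[i][j]) % M
                if lead <= back:
                    w[(i, j)] = lead
    return w


def longest_paths(w, n):
    """dist[a][b] = maximum weight of a directed path a -> b (-inf if none)."""
    dist = [[NEG] * n for _ in range(n)]
    for a in range(n):
        dist[a][a] = 0
    for (a, b), wt in w.items():
        dist[a][b] = max(dist[a][b], wt)
    for c in range(n):
        for a in range(n):
            for b in range(n):
                if dist[a][c] + dist[c][b] > dist[a][b]:
                    dist[a][b] = dist[a][c] + dist[c][b]
    return dist


def inc_graph(e, i, K):
    """One increment move by process i; only row e[i] is modified."""
    n, M = len(e), 3 * K
    w = make_graph(e, K)
    dist = longest_paths(w, n)
    for j in range(n):
        if j == i:
            continue
        # Edge (j, i) closes a maximum-weight path from some k to i.
        on_max_path = (j, i) in w and any(
            dist[k][j] != NEG and dist[k][j] + w[(j, i)] == dist[k][i]
            for k in range(n) if k != i
        )
        leads_by_less_than_K = (i, j) in w and w[(i, j)] < K
        if on_max_path or leads_by_less_than_K:
            e[i][j] = (e[i][j] + 1) % M


def counters_from_positions(r, K):
    """Hypothetical helper: counters consistent with token positions r."""
    n = len(r)
    return [[min(r[i] - r[j], K) if r[i] > r[j] else 0 for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    K = 2
    e = counters_from_positions([4, 2, 0], K)
    inc_graph(e, 2, K)  # the trailing process moves one step
    print(make_graph(e, K))
```

In the example, the capped edge from process 0 to process 2 is not on a maximum path (the path through process 1 is heavier), so only the counter facing process 1 advances. This is the situation the max_paths condition is there to handle.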

5  The  Algorithm 

Based on Observation 1 (Section j), if a process advanced K rounds ahead of another, it can erase its contribution to the trailing process' coin. A trailing process performing next_coin_value using that location will possibly see that process' counter as 0, but this can only cause it to perform an additional expected O(n²) steps (by Lemma 3.2), before advancing to the next round⁵.

The round field of any value v_i consists of two fields: coin and edge_counters. The coin field is an array of coin counters c_i[a], a ∈ {0..K}, with an added current_coin pointer in the range {0..K}⁶. These counters are used to maintain the local parts of coins corresponding to the latest K rounds executed by process i. The counter to be used for the next coin of process i is determined by the function next(current_coin_i), returning (current_coin_i + 1) mod (K + 1). The edge counters field is an array of n edge counters as described in Subsection 4.3. Initially all the above are 0. The following is thus the bounded implementation of the coin flipping and round incrementing operations for process i.

function next_coin_value(round);
begin
    G := make_graph(e_1[1..n]..e_n[1..n]);
    c[i] := coin_i[next(current_coin_i)];
    for j := 1 to n skip i do
        if (j,i) ∈ G and w(j,i) < K then
            c[j] := coin_j[(current_coin_j −
                    w(j,i) + 1) mod (K + 1)]
        else c[j] := 0 fi od;
    return coin_value(c);
end;

⁵ Several modifications that will improve the expected running time here and elsewhere in the algorithm are possible, but are not introduced for the sake of simplicity.

⁶ In the procedures below, all fields are first written to a local variable, on which the write operation of the scannable memory is then performed.


procedure flip_next_coin(round);
begin
    walk_step(coin_i[next(current_coin_i)]);
end;

function inc(round);
begin
    current_coin_i := next(current_coin_i);
    coin_i[next(current_coin_i)] := 0;
    inc_graph(e_1[1..n]..e_n[1..n]);
end;

In the above procedure, note that when advancing to a new round, a process prepares the coin counter for flipping the coin in the next round.
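The slot arithmetic used by next_coin_value, flip_next_coin and inc can be illustrated with a small single-process Python sketch. It rests on assumptions that should be read as such: the class name `CoinRing` and its methods are hypothetical, next is taken to advance the pointer by one slot modulo K + 1, and `walk` replaces the random walk step by an explicit ±1 so the behaviour is reproducible. The graph update performed by inc is omitted.

```python
class CoinRing:
    """Local coin counters of one process, kept in K + 1 reusable slots."""

    def __init__(self, K):
        self.K = K
        self.coin = [0] * (K + 1)  # coin[a], a in {0..K}
        self.current = 0           # the current_coin pointer

    def next_slot(self):
        # Assumed reading of next(): advance one slot, wrapping at K + 1.
        return (self.current + 1) % (self.K + 1)

    def walk(self, delta):
        # Stand-in for walk_step: delta is +1 or -1, chosen by the caller.
        self.coin[self.next_slot()] += delta

    def inc(self):
        # Advance to the new round and clear the slot of the coin after it.
        self.current = self.next_slot()
        self.coin[self.next_slot()] = 0

    def slot_seen_by_trailer(self, w):
        # Slot that a process trailing by w rounds reads for its next coin.
        return (self.current - w + 1) % (self.K + 1)


if __name__ == "__main__":
    ring = CoinRing(K=2)
    ring.walk(+1)   # contribution to the first coin
    ring.inc()
    ring.walk(-1)   # contribution to the second coin
    ring.inc()
    print(ring.current, ring.coin)
    print(ring.slot_seen_by_trailer(1))  # slot holding the second coin
```

A further call to `inc` wraps the pointer around and clears the slot that held the first coin, which is the erasure of old contributions described at the start of this section.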

We assume that processors start with binary initial values; however, the protocol can be extended to handle arbitrary initial values. Let K be [...]. The following is thus the consensus algorithm for processor i, with initial value v_i. Process i is a leader if for all j ≠ i, (i,j) is in G, that is, having r_i equal to or dominating all other r_j. Process i agrees with process j if both prefer the same value v ≠ ⊥.

write([pref: v_i, round: inc(round)])
repeat forever
1:  scan;
2:  if all who disagree
       trail by K and I'm a leader
    then decide(pref);
3:  elseif leaders agree then
4:      write([pref: v, round: inc(round)])
5:  elseif pref ≠ ⊥ then
6:      write([pref: ⊥, round: round])
    elseif next_coin_value(round) =
           undecided then
7:      write([pref: ⊥,
               round: flip_next_coin(round)])
    else
8:      write([pref: next_coin_value(round),
               round: inc(round)])
    fi fi fi fi;
end;

6  Proof  of  Correctness 

The following section outlines the proofs that the algorithm has the properties of consistency, validity, and that it terminates in polynomial expected time. To simplify the proofs, the notion of a virtual global round is introduced, supporting the illusion that a process has an unbounded and monotonically non-decreasing round number, and that a unique shared coin is associated with each round.

6.1  Virtual  Global  Rounds 

The serializability property (PS) of scan operation executions implies that there is some linear ordering on the scan operation executions performed by all processes. Throughout the proof, let S^{a} denote the a-th scan in this ordering; if the a-th scan is performed by process j, denote it by S_j^{a}. One scan operation execution is said to be later than another if it is greater in this ordering. In the consensus protocol processes alternate between performing write and scan operations. This implies that between any two scans S^{a} and S^{a+1}, there is at most one write by any process. Denote by var^{a} the value of any variable var that was read by S^{a}.

With each process i, in the a-th scan, a virtual global round is associated, denoted by round(i, S^{a}). The definition is by induction on the ordering among scan operation executions.

Base case. For all i, round(i, S^{0}) = 0.

Inductive step. Given round(i, S^{a−1}), let

    max = max_{i ∈ {1..n}} round(i, S^{a−1}),

    old_leaders(S^{a−1}) = {j | round(j, S^{a−1}) = max},

and

    new_leaders(S^{a}) = {j | j ∈ old_leaders(S^{a−1}) and
                              e_j[1..n]^{a} ≠ e_j[1..n]^{a−1}}.

Based on the above definitions, define round(i, S^{a}) as follows. If new_leaders(S^{a}) ≠ ∅, let j* ∈ new_leaders(S^{a}) and define

    round(i, S^{a}) = max + 1                 if i ∈ new_leaders(S^{a}),
                      max + 1 − dist(i, j*)   otherwise.

In case the set new_leaders(S^{a}) = ∅, let j* ∈ old_leaders(S^{a−1}) and define

    round(i, S^{a}) = max − dist(i, j*).

The above definition is simply that if one of the leaders in the former scan operation execution moved, all new processes are ordered relative to it, and otherwise they are ordered relative to the old leaders. Note that though the virtual global round of a process might change even without its performing an inc operation, it can only increase, that is, the virtual global round is a non-decreasing function.
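The inductive step can be made concrete with a short Python sketch. It is an illustration only: the function name `next_virtual_rounds` is hypothetical, `moved` stands for the set of processes whose edge counters differ between scans a−1 and a, and `dist` is supplied by the caller (for instance, computed from the graph returned by scan a) and called with its arguments in the same order as in the definition above.

```python
def next_virtual_rounds(prev_round, moved, dist):
    """Compute round(i, S^{a}) for all i from round(i, S^{a-1}).

    prev_round: dict process -> virtual global round at scan a-1
    moved:      processes whose edge counters changed between the scans
    dist:       callable dist(i, j_star), as in the definition above
    """
    top = max(prev_round.values())
    old_leaders = {j for j, r in prev_round.items() if r == top}
    new_leaders = old_leaders & set(moved)
    if new_leaders:
        j_star = min(new_leaders)  # any member may serve as the reference
        return {
            i: top + 1 if i in new_leaders else top + 1 - dist(i, j_star)
            for i in prev_round
        }
    j_star = min(old_leaders)
    return {i: top - dist(i, j_star) for i in prev_round}


if __name__ == "__main__":
    prev = {1: 3, 2: 3, 3: 1}

    # Case A: leader 1 moved; the others now lag it by 1 and 3 rounds.
    lag_a = {2: 1, 3: 3}
    print(next_virtual_rounds(prev, {1}, lambda i, j: lag_a[i]))

    # Case B: no leader moved; everyone is placed relative to an old leader.
    lag_b = {1: 0, 2: 0, 3: 2}
    print(next_virtual_rounds(prev, set(), lambda i, j: lag_b[i]))
```

In both cases of the example no process ends up in a smaller round than before, matching the observation that the virtual global round is non-decreasing.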

In the following subsections, a round means a virtual global round unless otherwise stated. A process p is said to be in round r, starting from the first scan operation execution in which it was returned as being in r (determined by applying the above definition), and in all later scan operation executions until it is returned as being in a round r' > r. A round is said to be among the K largest (for some constant K) starting from the earliest scan operation execution in which some process is in this round and no other process is in a round greater by K, and until the first later scan operation execution for which there is a process in a round greater by K.

6.2  Consistency  and  Validity 

Though we have attempted to maintain the general structure of the correctness and complexity proofs for the unbounded implementation of [AH88] by introducing virtual global rounds, the differences between our rounds strip implementation and the rounds strip used in [AH88] force us to modify some of the statements, and to change most of the proofs.

For simplicity it is assumed that there are only two possible input values, where v̄ denotes the value different from v, for v ∈ {0,1}. A process p prefers v in round r if for some scan S^{a} it is the case that round(p, S^{a}) = r and pref_p^{a} = v. We have

Lemma 6.1. If process p prefers v in round r and prefers v̄ in round r' > r, then some process q ≠ p preferred v̄ in round r'' ≥ r.


Proof (Sketch) By the algorithm, a process changes its preference only by executing inc. Let S_p^{a} be the scan performed by p before executing this inc. This can occur only if some other process, say q, had pref_q^{a} = v̄, and in the graph returned in S_p^{a}, q has non-negative distance from p. Since rounds are monotonically non-decreasing, it is the case that round(q, S_p^{a}) ≥ round(p, S_p^{a}) and the claim follows. ■

The above lemma and the code of the algorithm imply the following two lemmas.

Lemma  6.2.  If  no  process  prefers  v  at  round  r 
when  round  r  is  among  the  2  largest  rounds,  then 
no  process  prefers  v  at  any  round  r'  >  r. 

Lemma  6.3.  If  no  process  prefers  v  at  round  r 
when  round  r  is  among  the  2  largest  rounds,  then 
no  process  is  busy  in  any  round  r'  >  r. 

Lemma  6.4.  If  every  process  that  completed 
round  r,  when  round  r  was  among  the  2  largest 
rounds,  preferred  v  in  round  r,  then  every  non- 
faulty  process  decides  v  by  round  r  +  1. 

Lemma  6.4  implies  validity,  since  if  all  pro¬ 
cesses  start  with  the  same  input  value  they  all 
prefer  this  value  in  round  1.  Hence  all  processes 
will  halt  at  round  2. 

Lemma 6.5. If any process decides in round r, then no process will ever be in a round larger than r + 2.

The  above  lemma  implies  that  all  processes  will 
execute  round  r  when  it  is  among  the  2  largest 
rounds.  We  use  this  fact  to  prove  that  the  algo¬ 
rithm  has  the  consistency  property. 

Lemma 6.6. If some process decides in round r then all processes will decide on the same value by round r + 1.

6.3  Expected  Running  Time 

A process is said to have selected its preference for round r deterministically, if it executed the corresponding inc in line 6. Similarly, a processor is said to have selected its preference for round r randomly, if it executed the corresponding inc in line 10. The following lemma assures that all processors that select their preference deterministically select the same value.

Lemma 6.7. If processes p and q deterministically selected v and v', respectively, as their preferences for round r, when r was among the 2 largest rounds, then v = v'.

Hence, one may talk about the deterministic value preferred in a certain round. The next lemma shows that the scheduler is forced to decide on the deterministic value of a round before any process starts flipping a coin for that round.

Lemma 6.8. If process p is deterministic in round r, and process q is randomized in round r, then p wrote its preference for round r before q started to perform flip_next_coin.

This lemma implies that decisions in different rounds are independent events. Thus, the probability of deciding in any round is that of a sequence of independent Bernoulli trials, with success probability ε, for some constant ε > 0 (this follows from Lemmas 3.1 and 3.4). Hence the expected number of rounds executed before the algorithm terminates is constant. As each shared coin is flipped in a polynomial expected number of steps (Lemma 3.2), the algorithm terminates in a polynomial expected number of steps.
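The last step can be spelled out as a short calculation. The symbols R (the number of rounds executed) and T(n) (a polynomial bound on the expected number of steps spent in any one round, whatever happened before it) are introduced here for illustration and are not the paper's notation; the sketch assumes each round ends in a decision with probability at least ε, independently of earlier rounds.

```latex
\Pr[R > r] \le (1-\varepsilon)^{r}
\quad\Longrightarrow\quad
\mathbb{E}[R] \;=\; \sum_{r \ge 0} \Pr[R > r]
\;\le\; \sum_{r \ge 0} (1-\varepsilon)^{r}
\;=\; \frac{1}{\varepsilon},
\qquad
\mathbb{E}[\text{steps}] \;\le\; T(n)\cdot\mathbb{E}[R] \;\le\; \frac{T(n)}{\varepsilon}.
```

Since ε is a constant and T(n) is polynomial in n, the bound T(n)/ε is polynomial as well.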
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